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Abstract 

Background: Meat quality involves many traits, such as marbling, tenderness, juiciness, and backfat thickness, all of 
which require attention from livestock producers. Backfat thickness improvement by means of traditional selection 
techniques in Canchim beef cattle has been challenging due to its low heritability, and it is measured late in an 
animal's life. Therefore, the implementation of new methodologies for identification of single nucleotide 
polymorphisms (SNPs) linked to backfat thickness are an important strategy for genetic improvement of carcass and 
meat quality. 

Results: The set of SNPs identified by the random forest approach explained as much as 50% of the deregressed 
estimated breeding value (dEBV) variance associated with backfat thickness, and a small set of 5 SNPs were able to 
explain 34% of the dEBV for backfat thickness. Several quantitative trait loci (QTL) for fat-related traits were found in 
the surrounding areas of the SNPs, as well as many genes with roles in lipid metabolism. 

Conclusions: These results provided a better understanding of the backfat deposition and regulation pathways, 
and can be considered a starting point for future implementation of a genomic selection program for backfat 
thickness in Canchim beef cattle. 
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Background 

Beef cattle production in Brazil is based on several 
breeds, depending on the geography and climate of a 
given area. Breeds based on Bos taurus are commonly 
raised as livestock for beef in the South of Brazil, but in 
most parts of the country, beef cattle production is 
based on Bos indicus (zebu) breeds raised on natural 
pastures. A good description of Brazilian beef cattle pro- 
duction was recently published [1]. Zebu breeds are con- 
sidered highly adapted to the tropical environment in 
Brazil [2-5], but they are known for their lower meat 
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quality in certain aspects, such as tenderness, palatabil- 
ity, and marbling [6-10], and for their lower 
reproduction efficiency [11,12] when compared to Bos 
taurus. The Canchim (3/8 zebu + 5/8 Charolais) breed 
was developed in the early 1960s in Brazil [13] with the 
intention of combining fitness traits from zebu to the 
higher reproduction efficiency and meat quality from the 
Charolais breed. 

Although the Canchim breed has fared well when 
raised on natural pastures in Brazil, some carcass traits 
have still remained inferior when compared to Bos 
taurus. One such trait is backfat thickness, which has 
been a concern for Canchim producers, and for the beef 
cattle industry in general, due to its low fat deposition 
in animals raised on pasture (1.90mm ± 0.77) [14]. 
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Improvement of this trait in Canchim beef cattle using 
traditional selection techniques has had limited success 
because of its relatively low heritability (0.23) [14], and 
because it is measured late in an animals life. Most stud- 
ies available in the literature regarding backfat thickness 
have been conducted on animals raised in feedlot sys- 
tems, which permits earlier ultrasound measurements, 
and has also shown moderate to high heritabilities 
[15-19], thereby allowing traditional selection techniques 
under these conditions to be more successful than com- 
pared to the Canchim breed. 

In attempts to improve meat quality, previous studies 
have focused on the identification of candidate markers 
associated with meat quality traits, as well as backfat 
thickness, in Canchim and other Bos indicus x Bos 
taurus crosses in Brazil [20-24]. However, these have 
had limited success, particularly in response to markers 
on the DDEFl and LEP genes [20,23]. Therefore, the 
identification of genetic markers linked to backfat thick- 
ness by novel methodologies is an important strategy for 
genetic improvement of carcass and meat qualities. One 
recently developed approach relies on examining how 
SNPs (single nucleotide polymorphisms) are associated 
with these qualitative traits [25]. More specifically, this 
method has been used successfully in studies that exam- 
ined fat-related traits, such as intramuscular fat percent- 
age, marbling, rib fat, backfat thickness and rump fat 
depth [26-35]. By the use of high-density SNP panel as- 
says for different breeds and crosses, these studies have 
collectively found such traits associated with regions on 
nine bovine chromosomes (6, 15, 17, 20, 21, 24, 25, 26, 
and 28) [27,28,32,35]. However, another study suggested 
that some of the effects attributed to each SNP can show 
variation based on the breed s origin, resulting from vari- 
ation in indicine and taurine- indicine composite cattle 
[35], thereby justifying the investigation of SNPs based 
on the breed of interest. 

A previous study using high-density SNP panel has as- 
sociated 100 SNPs to backfat thickness in a Canchim 
population using an approach that selected animals with 
extreme phenotypes for genotyping [33]. Those SNPs 
were located on several bovine autosomes, and from 
them, the authors further investigated and validated two 
regions on chromosome (chr) 14 associated with backfat 
thickness, where the haplotypes were responsible for 
0.24% to 1.1% of the phenotypic variance for this trait. 

Although these results are useful, it is well known that 
quantitative traits are polygenic as each SNP may account 
for only a small part of the phenotypic variance, therefore 
joint analysis of many SNPs has become a more interest- 
ing strategy [36,37]. This, however, exacerbates the large 
p, small n' problem faced by genome-wide studies, which 
means that there is a small number of phenotypes (n) to 
predict a large number of SNP (p) effects [38]. 



One solution to this problem is through the use of 
Random Forest, a machine learning algorithm capable 
of dealing with certain datasets for building model 
independent classification and/or regression problem 
predictors [39] . Specifically, it embeds a procedure of ac- 
counting for predictor variable importance, which re- 
sults in a score that can be used for prioritizing variables 
(SNPs), similar to p-values from statistical tests [40-42]. 
Because of these features, the variable importance of the 
random forest method has been recognized as an useful 
methodology for genome-wide association studies [43]. 

Considering all of the above, the objectives of this 
study were to identify SNPs associated with backfat 
thickness in Canchim beef cattle using the random for- 
est approach for genome-wide association studies, to 
shed insight on potential genes associated with this trait, 
and to discover potential SNPs for future implementa- 
tion of genomic selection (GS). The set of SNPs identi- 
fied by this methodology explains as much as 50% of the 
deregressed estimated breeding value variance associated 
with the observed phenotype. These results intend to pro- 
vide a better understanding of the backfat deposition and 
regulatory pathways, and to enable the use of the identified 
SNPs in validation studies for genomic selection. 

Methods 

Animals and phenotypes 

Animals used in this study were part of the Canchim 
Breeding Association from seven herds located in two 
Brazilian states (Sao Paulo and Goias). This research is 
in agreement with the ethical principles of animal ex- 
perimentation of Embrapa Southeast Livestock Ethical 
Committee of Animal Use (CEUA-CPPSE), and has been 
performed with the approval of CEUA-CPPSE under 
protocol number 02/2009. An initial sample of 987 
animals (males and females) was evaluated for backfat 
thickness by ultrasound in vivo over the 12th rib around 
the age of 18 months. All animals evaluated were born 
between 2003 and 2005 and raised on natural pastures. 

These 987 animals had the estimated breeding value 
(EBV) predicted by restricted maximum likelihood using 
the MTDFREML software [44]. The animal model in- 
cluded fixed effects of contemporary group (sex, year, 
herd, and genetic group) and age at measurement as a 
linear covariate, the additive genetic effect and error 
were included as random effects. From these animals, a 
sample of 400 was selected considering: EBV, accuracy, 
family size, and proportion between males (196) and 
females (204). These 400 animals were offspring of 50 
different sires (with 1 to 30 offspring per sire). 

Genotyping and SNPs quality control 

The selected 400 animals were genotyped using the 
BovineHD BeadChip (Illumina Inc., San Diego, CA). The 
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quality control filters included call rate (< 0.90) for sam- 
ples and SNPs, minor allele frequency (MAF < 0.01), and 
heterozygosity (< 3 standard deviations). After quality con- 
trol processing, 396 animals and 708,641 SNPs with an 
average call rate higher than 0.99 remained in the study. 

Genome-wide association analysis 

Genome-wide association (GWA) analysis was performed 
on deregressed EBVs (dEBV) [45], which takes into ac- 
count the pedigree matrix, estimated heritability (0.16, 
data not shown), EBVs, and EBVs accuracies obtained by 
the same animal model described above. For the estima- 
tion of dEBVs the data set was enhanced with data col- 
lected from animals born between 2005 and 2008 totaling 
1,648 individuals with phenotypes for backfat thickness, 
with 6,801 animals in the pedigree matrix. 

Association of SNPs to dEBVs was undertaken by a 
random forest package [46] available in the R-project 
software [47]. The association analysis was composed of 
a two-step procedure. In the first step, the SNPs with 
the highest 1% importance score by chromosome were 
selected, and in the second step, the outcome set of 
SNPs from the first step was re-analyzed disregarding 
the chromosome classification, and the SNPs with the 
highest 1% importance score were selected. For the asso- 
ciation analysis, the missing genotypes were imputed by 
the naive method provided in the random forest package 
(which imputes column median values for missing geno- 
types), the number of trees to grow and the number of 
randomly selected candidate SNPs at each split were set 
to 5,000 and 10% from the SNPs being evaluated, re- 
spectively. This procedure was done using the 396 sam- 
ples available. 

Taking into account the unbalanced offspring range 
among sires, 10 subsamples consisting of 198 animals 
each were also analyzed in the same two-step process as 
previously described. The 10 subsamples were selected 
as follows: i) The first animal was chosen at random 
from the 396 genotyped animals; ii) The next animal 
was selected based on the lowest relationship with the 
previous selected animal, but most representative from 
the rest of the genotyped animals; and iii) Step ii was re- 
peated until 198 animals were selected. 

Two approaches were considered for further SNP in- 
vestigation among the results obtained by the random 
forest analysis. One approach selected the SNPs in com- 
mon among the analysis with the 396 animals and the 
10 subsamples, called the Common SNPs strategy. An- 
other approach selected only the top 1% (importance 
score) from the analysis with 396 animals, called the 
Highest 1% SNPs. 

Finally, after both sets of SNPs (Common SNPs and 
Highest 1% SNPs) had been selected, each set of selected 
SNPs were fitted into a final stepwise regression model 



using SAS/STAT software [48] to estimate the amount 
of variance explained by the selected SNPs in the data 
set (final model values correspond to the dEBV vari- 
ance explained by the model, which are reported in 
Table 1). For doing so, the SNPs were coded as 0, 1, and 
2 for the A A, AB, and BB genotypes, respectively. In 
order to evaluate the significance of the results, a per- 
mutation test was conducted to estimate the bias associ- 
ated with the R^ obtained from the stepwise regression 
analysis. In the permutation test, the dEBV values were 
shuffled and then regressed to the same SNPs previously 
selected. The permutation test was repeated 1,000 times. 

Candidate genes and pathways 

A pathway analysis was conducted to characterize the 
genomic regions identified by the set of SNPs previously 
selected and to identify candidate genes influencing bio- 
logical functions and pathways related to backfat thick- 
ness and fat-related traits. 

The software fastPHASE version 1.4.0 [49] was used 
for reconstructing the haplotypes for each chromosome. 
Afterwards, the reconstructed haplotypes were analyzed 
by the software Haploview [50] (using default parame- 
ters) for estimating haplotype blocks and linkage dis- 
equilibrium (LD), which was calculated based on the 
squared correlation coefficient between SNP pairs (r^). 
Considering the extent of LD based on the overall aver- 
age r^ (average r^ = 0.12 at a distance of 250Kb, data not 
shown), a window of 500Kb (SNP position ± 250Kb) sur- 
rounding each SNP previously selected by the stepwise 
regression was considered to define the region used for 
candidate gene discovery and pathway annotation. 

The Cattle Genome Browser through the UMD 3.1 
Cattle genome assembly [51], was used for visualization 
of the selected SNPs and surrounding areas for 
localization and identification of QTLs, genes, and other 
interesting genomic landmarks. Other databases, such 
as the NCBI BioSystems database [52], and Kyoto 
Encyclopedia of Genes and Genomes (KEGG) [53,54] 
were also used for pathway annotation to gain insight 
into the biological processes involved in backfat thick- 
ness deposition. 

Results 

We performed regression analysis for both strategies 
(Common SNP and Highest 1% SNP), and the results 
were very similar in the final number of SNPs selected, 
and the percentages of dEBV variance explained by the 
final set of SNPs (Table 1) enabling the discussion to be 
focused on the set of 21 SNPs selected from the Highest 
1% SNP strategy due to its higher % of dEBV variance 
explained. Also, the first five SNPs (rsl33046994, 
rsl37294146, rsl09349988, rsl36717249, rsl34790147) 
in the regression model were the same and in the same 
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Table 1 Number of candidate and final SNPs selected through the Common SNP and Highest 1% SNP strategies 





Candidate 


Final Model 


% dEBV 


Permutation 




SNPs 


SNPs 


variance^^^ 


r2(2) 


Common SNP 


162 


19 


50.59% 


0.00 ± 0.02 


Highest 1% SNP 


70 


21 


53.27% 


0.00 ± 0.02 



^ dEBV deregressed estimated breeding value variance explained by the final fitting of SNPs. The % dEBV variance is the model from the final analysis which 
fits all SNPs as fixed effects into a regression analysis. 

^ Permutation R^: average values and standard deviations for R^ from 1,000 permutation tests. 



order for both strategies. These first five SNPs were re- 
sponsible for 34.13% of dEBV variance for backfat 
thickness. 

As a precaution against spurious artifacts that can 
result from splitting small samples into training and 
validation datasets, this was not performed here. An 
alternative option is to use a permutation test, which 
calculates the probability of obtaining a value more ex- 
treme than or equal to the observed value of a test stat- 
istic by shuffling the data and recalculating the test 
statistic. The proper test statistic for multiple regression 
is the coefficient of multiple determination, [55]. 

A permutation test was carried out to evaluate the 
probability of bias associated with the from the step- 
wise regression analysis (Table 1). The average R^ from 
1,000 permutation tests was 0.00 ± 0.02 for the Highest 
1% SNP strategy, showing that there is a small bias asso- 
ciated with the R^ from the stepwise regression analysis. 
However, this is very small when compared to the 
53.27% obtained from the Highest 1% SNP strategy, and 
therefore reinforces the significance of the results 
presented in Table 1. 

Table 2 shows the 21 SNPs selected by the stepwise re- 
gression, their chromosome, position, % of dEBV vari- 
ation explained by the SNP, genes annotated within ± 
250Kb, fat-related QTLs described in the current litera- 
ture, and references. Table 3 shows a summary of pathway 
annotation using the genes within ± 250Kb from the 21 se- 
lected SNPs using the KEGG [53,54] pathway database. 

Discussion 

The use of the random forest approach as a first step, to 
filter candidate SNPs without taking into consideration 
a statistical model specification, is advantageous in 
genome-wide association studies, as long as little is 
known about candidate areas and the genetic architec- 
ture of the specific trait. Furthermore, the fact that re- 
sults were obtained using two different strategies 
(Common SNPs and Highest 1% SNPs) and are very 
similar, provides reliability to the random forest method- 
ology as can be seen in the previous study [43]. 

With the exception of four selected SNPs in the 
Highest 1% SNPs strategy (chr 12: rsl36348926; chr 11: 
rsl 10833507; chr 2: rs42923911; chr 9: rsl 10025080), all 
other SNPs presented a fat-related QTL described in 



their chromosome region. Also, only one SNP on chr 3 
(rs42021729) is not close to any described gene in the 
surrounding area (± 250kb) (Table 2). 

In a previous genome-wide association study in 
Canchim, 100 SNPs on several chromosomes were con- 
sidered the optimal set of SNPs to differentiate the 30 
individuals with extreme phenotypes for backfat thick- 
ness. Among these SNPs, two haplotypes on chr 14 were 
genotyped and their association to the phenotype was 
validated in the whole population [33]. In the current 
study, even though SNPs from chr 14 were associated 
with backfat thickness by the random forest approach 
(in the Common SNP and Highest 1% SNP strategies, 
data not show), these SNPs were not selected in the 
stepwise regression model. Conflicting results and/or 
studies that cannot be replicated in the post-genomic 
area are not so uncommon [56-59], and these differences 
can be attributed to partially insufficient power, false- 
positive results, bias, sample size, and to differences in 
populations, controls, and methodologies [56-58], or 
true heterogeneity associations [56]. In these two GWA 
studies with Canchim, the base population is very simi- 
lar, but the sample size and methodologies are not, 
which could explain the difference in the findings. A fu- 
ture option to help clarify the inconsistency in these 
findings would be to perform a meta-analysis, which 
combines data together to increase sample size and 
power, while reducing error risks [58,60]. 

Another outcome from this study and the previous 
one [33] is the possibility of including these SNPs in the 
development of a low density SNP (LD-SNP) panel for 
implementation of genomic selection in Canchim beef 
cattle. The most widespread strategy for developing 
small panels is by applying methods of variable selection 
to identify a diminutive set of SNPs that have good pre- 
dictive power for the trait or breeding value [61]. The in- 
crease in accuracy of genomic breeding values obtained 
by using LD-SNP panels can be highly similar (around 
90%) compared to the accuracies obtained by high dens- 
ity panels [62,63], at a more cost-effective price. There- 
fore, it is more likely to be adopted by farmers and the 
beef industry [64]. Furthermore, LD-SNP panels devel- 
oped with SNPs selected on the basis of their effects 
perform better than LD-SNP panels with SNPs evenly 
spaced [62,63]. Importantly, SNPs identified in these 
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Table 2 Summary of information available for the Highest 1% SNPs selected by the stepwise regression 



dbSNP' 


Chr^ 


Position 


% dEBV^ 


Genes'^ in ± 250Kb 


Fat-related QTL^ 


QTL reference 


rs 133046994 


10 


18129602 


1 1.12 


THSD4, LRRC49 


SF, MS 


[65,66] 


rsl 372941 46 


1 


132385787 


9.41 


S0X]4, ClDmS, DZPIL 


FT12R, IF 


[29,71] 


rs 109349988 


3 


15814096 


5.21 


KCNN3^ EFNA3, EFNA4, DCST2, LOO 00294774, PMVK, 
ADAR, CHRNB2, ADAM! 5, ZBTB7B, LOCI 00294857, DCSTl, 
FLADl, PYG02, CKSIB, PBXIPl, SHCl, LOCI 00294894 


FT12R, MS 


[66] 


rsl 3671 7249 


19 


37969870 


4.88 


B4GALNT2, GNGT2, ABI3, PHOSPHOl, 
ZNF652, NGFR, PHB, IGF2BP1, GIP 


OAC, PAC 


[84,85] 


rsl 34790 147 


13 


20780821 


3.51 


CCDC7, ARL5B, MGC152301, LOC100848675, 
LOCI 00847992 


FT12R 


[29] 


rsl 36287610 


25 


42678992 


2.89 


FAM20C, LOC783396, LOC100300875, LOC100337322, 
rnlxAH 1 D, rULorA, rlLnuj, LUL/ojojz, LUL/ojyo I 


FT12R 


[29] 


rs 1 oDoyoDD/ 


1 1 


djd I yoyy 


Z.J 1 


LUL/oOOZ 1 , LUL 1 UU 1 JyozO, tlAAl 


r 1 1 Zn, IVIo 


Lzyj 


rs4 1 /yuooy 


\ 0 


yyuzjj 


Z.U/ 


UrlL, rntLr, rIVlUU, D/Uz, LUL/oy4 1 J, LUL./oyjyH, 

CHI3L1, MYBPH, ADORAl, LOCI 00847554 


r 1 1 zK, IVIo 


Lzyj 


rs42126516 


4 


52535108 


1.99 


TFEC LOC100296613, TES 


MS 


[29,97] 


rs42021729 


3 


64737352 


1.46 




MS 


[98] 


rsl 37001 098 


8 


95507919 


1.46 


SMC2, LOC100337180 


MS 


[29] 


rs43341824 


1 


50110036 


1.23 


LOC785980, ALCAM 


OAC, FT12R 


[85,99] 


rS4 1 Doo/jo 




ooz 1 y 1 U J 


1 .U4 


LALNdz, NjUNo, orLI, LUL 1 UUo4//U 




Lzy,ojj 


rs 1 oD34oyzD 


1 2 


1 UU434 1 U 


0.90 


1 r\r'~70ACiA c 
LuL/oo94d 






rsl 09869647 


3 


13195543 


0.72 


LOC100849046, LOC100848852, LOC784007, LOC783963 


MS 


[66] 


rsl 10833507 


11 


42856561 


0.69 


LOCI 00296234, LOCI 00296682, BCLllA 






rs4292391 1 


2 


12761205 


0.57 


LOC787311, LOCI 00848878 






rsl 356381 25 


10 


18147174 


0.55 


THSD4, LRRC49 


MS, SF 


[65,66] 


rsl 10607520 


9 


96622647 


0.66 


SYTL3, TULP4, TMEM181, EZR, LOC781263, DYNLTl, 
RSPH3, LOC782714, TAGAP, LOC782637 


MS 


[29] 


rsl 10025080 


9 


11710300 


0.52 


RIMSl 






rsl 09697559 


2 


61906393 


0.58 


LOCW0847709, LOC100297008, LCT, UBXN4, 
MCM6, DARS, R3HDM1, MIR128-1 


MS 


[29] 



^ Reference SNP cluster report. 
^ Chromosome in B. taurus. 

^ % dEBV variance is the model for each of the SNPs in the final analysis which fits all SNPs as fixed effects into a regression analysis. 
Gene symbol. 

^ SF subcutaneous fat, FT12R fat thickness at the 12th rib, IF intramuscular fat, FT" fat thickness, MS marbling score, OAC oleic acid content, PAC palmitoleic 
acid content. 



studies need to undergo a prior validation in a population 
of animals which are not included in the population used 
for the SNP discovery (training population), enabling con- 
fidence in genomic predictions for future populations. 

From the SNPs identified in this study, there were two 
on chr 10 (rsl33046994, rsl35638125) associated with 
backfat thickness, which together accounted for almost 
12% of the dEBV variation (Table 2), These two SNPs 
are in the same chromosomal region as fat-related QTLs 
identified in previous studies [65,66], and they map to 
the same genes {THSD4 - thrombospondin, type I, 
domain containing 4, and LRRC49 - leucine-rich repeat- 
containing protein 49) thereby indicating THSD4, 
LRRC49 and the surrounding areas as strong candidates 
for further investigations and validation. The LRRC49 
gene has been linked to breast cancer in humans, but 



very little is known about the biological function of the 
protein encoded by this gene [67]. 

The THSD4 gene in Bos taurus and in Homo sapiens 
has a provisional status from RefSeq [68], which, by defin- 
ition, supports that this gene is both transcribed and 
expressed. Further evidence for the annotation of this gene 
is given by its sequence identity in the UniGene database 
[52] when compared to orthologous sequences from M. 
musculus (95.1%), which has a validated status in RefSeq, 
and to H, sapiens (93.1%), suggesting a well-conserved 
homology of the THSD4 gene in these species. 

The THSD4 gene encodes a protein with conserved 
disintegrin and metalloprotease domains, which it shares 
with the ADAM-TSl protein family, and plays an import 
role in adipogenesis [69]. Previous studies have shown 
that this protein family interferes with the availability of 
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Table 3 Summary of pathway description from the KEGG Pathway Database 



Global Pathway Subpathway Pathway Gene SNP 



M6t3bolism CsrbohydrstB M6t3bolism 


Galactose metabolism 


LL / 


rcl nQAQ7^c;Q 
IS 1 UyOy / DDy 




Amino sugar and nucleotide 
sugar metabolism 


CHI3L1 


rs41 790889 


Lipid Metabolism 


Glycerophospholipid metabolism 


rnUjrnU 1 


rs 1 DO/ \ /z4y 


Metabolism of Terpenoids 
and Polyketides 


Terpenoid backbone biosynthesis 


PMVK 


rs 109349988 


Metabolism of Cofactors 
and Vitamins 


Riboflavin metabolism 


PI Am 
rLAU 1 


rcl nQQ/IQQQQ 


Genetic Information Replication and Repair 
Processing 


DNA replication 


IVIUViO 


rs 1 (jyoy/bby 


Folding, Sorting 
and Degradation 


RNA degradation 


BTG2 


rs4 1 /yuooy 


Translation 


Aminoacyl-tRNA biosynthesis 


DARS 


rsl 09697559 


Environmental Information Signal Transduction 
Processing 


MARK signaling pathway 


CACNB2, PDGFA 


rs41 683753, rsl 36287610 




ErbB signaling pathway 


SHCl 


rsl 09349988 


Signaling Molecules 

dllU II ILcldLLIUI 1 


Cell adhesion molecules (CAMs) 


CLDN18, ALCAM 


rsl 372941 46, rs43341824 




IMcUl UclLLI Vc liydllU IcLcpLUI II 1 Lcl dLLIUI 1 


CHRNR? GIP 
K^nniMDZ, \jir, 

ADORAl 


r<;inQ^4QQRR r<;1 ^(^71 774Q 

rs41 790889 




Cytokine-cytokine receptor interaction 


NGFR, PDGFA 


rsl 3671 7249, rsl 36287610 


Cellular Processes Cell Motility 


Regulation of actin cytoskeleton 


FZR, PDGFA 


rsl 10607520, rsl 36287610 


Cell Growth and Death 


Cell cycle 


MCM6 


rsl 09697559 




Apoptosis 


PRKARIB 


rsl 36287610 


Cell Communication 


Tight junction 


CLDN18 


rsl 372941 46 




Focal adhesion 


SHCl, PDGFA 


rsl 09349988, rsl 36287610 




Gap junction 


PDGFA 


rsl 36287610 


Transport and Catabolism 


Peroxisome 


PMVK 


rsl 09349988 


Organismal Systems Circulatory System 


Cardiac muscle contraction 


CACNB2 


rs41 683753 


Immune System 


Leukocyte transendothelial migration 


FZR, CLDN18 


rsl 10607520, rsl 372941 46 




Chemokine signaling pathway 


GNGT2, SHCl 


rsl 3671 7249, rsl 09349988 




Cytosolic DNA-sensing pathway 


ADAR 


rsl 09349988 




Natural killer cell mediated cytotoxicity 


SHCl 




Digestive System 


Gastric acid secretion 


FZR 


rsl 10607520 




Carbohydrate digestion and absorption 


LCT 


rsl 09697559 


Nervous System 


Glutamatergic synapse 


GNGT2 


rsl 3671 7249 




GABAergic synapse 


GNGT2 


rsl 3671 7249 




Cholinergic synapse 


GNGT2, CHRNB2 


rsl 3671 7249, rsl 09349988 




Dopaminergic synapse 


GNGT2 


rsl 3671 7249 




Serotonergic synapse 


GNGT2 


rsl 3671 7249 




Retrograde endocannabinoid signaling 


GNGT2, RIMSl 


rsl 3671 7249, rsl 10025080 




Synaptic vesicle cycle 


RIMSl 


rsl 10025080 




Neurotrophin signaling pathway 


SHCl, NGFR 


rsl 09349988, rsl 3671 7249 


Development 


Axon guidance 


FFNA3, FFNA4 


rsl 09349988 


Endocrine System 


Insulin signaling pathway 


SHCl, PRKARIB 


rsl 09349988, rsl 36287610 
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Table 3 Summary of pathway description from the KEGG Pathway Database (Continued) 



Human Diseases Cardiovascular Diseases 


Hypertrophic cardiomyopathy (HCM) 


CACNB2 


rs41 683753 




Arrhythmogenic right ventricular 
cardiomyopathy (ARVC) 


CACNB2 


rs41 683753 




Dilated cardiomyopathy (DCM) 


CACNB2 


rs41 683753 


Infectious Diseases 


Pathogenic Escherichia coli infection 


EZR 


rsl 10607520 




Hepatitis C 


CLDN18 


rsl 37294146 




Measles 


ADAR 


rsl 09349988 




Influenza A 


ADAR 


rsl 09349988 




Bacterial invasion of epithelial cells 


SHCl 


rsl 09349988 




HTLV-I infection 


PDGFA 


rsl 36287610 


Substance Dependence 


Morphine addiction 


GNGT2, ADORAl 


rsl 3671 7249, rs41 790889 




Nicotine addiction 


CHRNB2 


rsl 09349988 




Alcoholism 


GNGT2, SHCl 


rsl 3671 7249, rsl 09349988 


Cancers 


Pathways in cancer 


CKSIB, PDGFA 


rsl 09349988, rsl 36287610 




Small cell lung cancer 


CKSIB 


rsl 09349988 




Glioma 


SHCh PDGFA 


rsl 09349988, rsl 36287610 




Chronic myeloid leukemia 


SHCl 


rsl 09349988 




Transcriptional misregulation in cancers 


NGFR, PDGFA 


rsl 3671 7249, rsl 36287610 




Melanoma 


PDGFA 


rsl 36287610 




Prostate cancer 


PDGFA 


rsl 36287610 



differentiation-inducing or differentiation-inhibiting gro- 
wth factors, either by modifying the extracellular matrix, 
affecting cell migration and adhesion, or by activating 
other pathways, which are key for regulating the differ- 
entiation of adipocytes, allowing their growth and ex- 
pansion during adipogenesis [70]. 

The subcutaneous fat percentage QTL reported on chr 
10 (Table 2) is from a Charolais x Holstein crossbred cat- 
tle population, and is described as highly significant with 
additive effects estimated to be 0.5 phenotypic standard 
deviation units [65]. The study also reveals that the 
Charolais allele was associated with higher fat levels. 

The SNP on chr 1 (rsl37294146) associated with 
backfat thickness is responsible for approximately 9.4% 
of the dEBV variation (Table 2). There is also a reported 
QTL for fat thickness over the 12th rib [29] and another 
for intramuscular fat percentage [71], indicating that 
there should be one or more genes in this area affecting 
fat metabolism. In the 500Kb window surrounding this 
SNP, three genes are annotated, SOX 14 (sex determining 
region Y - box 14), CLDN18 (claudin 18), and DZIPIL 
(DAZ interacting protein 1-like). The S0X14 gene seems 
to be involved in the regulation of embryonic develop- 
ment, whereas CLDN18 belongs to a multigene family 
that encodes a tetraspanning membrane protein acting 
on components at tight junctions, but its regulatory 
mechanisms, and roles in physiology and pathology are 
still under investigation [72]. The DZIPIL gene encodes 



a zinc finger protein, but how it affects either adipo- 
genesis or lipid metabolism has not been depicted from 
the current literature. Nonetheless, the functions of 
these gene products are still being elucidated. 

The 500Kb window around the SNP on chr 3 
(rsl09349988) reveals many annotated genes, of which 
some have been reported as participating in lipid metab- 
olism. For example, PMVK (phosphomevalonate kinase) 
catalyzes the conversion of mevalonate 5-phosphate with 
ATP to form mevalonate 5-diphosphate and ADP, which 
is one of the initial reactions involved in the cholesterol 
biosynthetic pathway [73]. Other proteins in this region 
include ADAR (adenosine deaminase, RNA-specific), 
which encodes an RNA-editing enzyme by site-specific 
deamination of adenosines, resulting in changes in pro- 
tein function or gene expression. A study in humans was 
conducted that found ADAR enzymes were associated 
with serum triglyceride and adiponectin levels, abdom- 
inal circumference, and body mass index [74]. Interest- 
ingly, this region also contains SHCl (Src homology 2 
domain containing - transforming protein 1) which has 
been reported as having a role in human obesity [75], 
and as being one of the mediators for regulating the 
insulin-like growth factor 1 (IGF-l) pathway, which plays 
a key role in regulating cell proliferation, differentiation 
and apoptosis [76]. Lastly, this region contains ADAM15 
(ADAM metallopeptidase domain 15), which belongs to 
the ADAM protein family previously discussed. These 
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studies corroborate our findings and require further in- 
vestigation to elucidate how these genes are affecting the 
deposition of subcutaneous fat in bovines. 

The SNP associated with backfat thickness on chr 19 
(rsl36717249) is responsible for approximately 4.88% of 
the dEBV variance. This region contains the PHOSPHOl 
(phosphatase, orphan 1) gene, which encodes a phos- 
phatase enzyme that has been implicated in the 
mineralization of the extracellular matrix, a key process 
for skeletal development [77]. The PHOSPHOl gene 
product has high activities toward phosphoethanolamine 
(PEA) and phosphocholine (PCho) [78], which are the 
main metabolites involved in the pathway for the 
formation of phosphatidylcholine and phosphatidyletha- 
nolamine [79]. These compounds are implicated in the 
metabolism of complex glycerolipids, prostaglandins, 
leukotrienes, glycosylphosphatidylinositol-anchors, and 
some amino acids, such as glycine, serine and threonine. 
Also included in this region is the PHB gene (prohibitin), 
which is thought to be involved in regulating cell prolif- 
eration, gene transcription, and apoptosis. In recent 
studies, deficient PHB activity in the liver has been asso- 
ciated with non-alcoholic steatohepatitis and obesity, 
although the mechanism remains unknown [80,81]. 
Other examples include the IGF2BP1 (insulin-like 
growth factor 2 mRNA binding protein 1) gene, which 
encodes a protein that binds to the mRNAs of certain 
genes and regulates their translation. Lastly, the GIP 
(gastric inhibitory polypeptide, also known as the 
glucose-dependent insulinotropic polypeptide) gene has 
a known effect on stimulating the release of insulin from 
pancreatic |3 cells, but also has an insulin-like effect on 
adipocytes, suggesting that the GIP gene product en- 
hances adipocyte glucose uptake, and that, at least in 
humans, it has an important role in the development of 
nutrition-induced obesity [82]. A recent study suggests 
that the GIP gene product has an effect on reducing free 
fatty acid release from adipose tissues, either by increas- 
ing reesterification or by inhibition of lipolysis [83]. 
Indeed, QTL studies reveal oleic acid content (OAC) 
and palmitoleic acid content (PAC) QTLs [84,85] in 
close proximity to the GIP gene in the bovine genome, 
which further suggests an association between this gene 
and free fatty acid processing. 

The SNP rsl34790147 on chr 13 also was associated 
with backfat thickness, and it is carrying 3.51% of the 
dEBV. Within this SNP region, a QTL for fat thickness 
over the 12th rib was found and described in an Angus 
population [29]. Also, a set of four genes are localized in 
the ±250kb window from the SNP position. The CCDC7 
gene (coiled-coil domain containing 7) seems to be asso- 
ciated with human cancer [86,87], and there is no infor- 
mation available for bovines. The ARL5B gene product 
(ADP-ribosylation factor-like 5B), also known as ARL8, 



belongs to a family of proteins that show similar struc- 
ture to ADP-ribosylation factors {ARFs family). ARLs 
and ARFs belong to the RAS superfamily of small 
GTPases, which function as modulators of complex and 
diverse cellular processes [88,89], of which the most ca- 
nonical are cell proliferation and differentiation. How- 
ever, they are also involved in protein trafficking through 
the trans-Golgi network (TGN). The TGN has a central 
role in protein sorting and directs the transport of newly 
synthesized proteins to different transport vesicles 
[90-92], and also receives recycled molecules and extra- 
cellular materials by retrograde transport. Recently, it 
was observed that ARL5B enhances retrograde transport 
from endosomes to the TGN [93]. The MGC152301 
(uncharacterized LOC783682) and the LOC524240 
(Alk-like) genes do not have any available information in 
terms of function of their gene products, but both show 
the same two conserved domains: cd00112 (LDLa) and 
cd06263 (MAM) [94]. The LDLa is a low density lipo- 
protein receptor class A domain, that plays an important 
role in mammalian cholesterol metabolism, the protein 
receptor binds LDL and transports it into the cell by 
endocytosis [95]. The MAM is an extracellular domain 
that mediates protein-protein interactions, and is found 
in a variety of proteins, of which many are known to 
function in cell adhesion [96]. The remaining 16 SNPs, 
which were not described in detail here, accounted for 
19.14% of dEBV variation for backfat thickness and, as 
seen in Table 2, most of them present some fat-related 
QTL described within their regions [29,65,66,85,97-99], 
and are of further interest for future investigations on 
how these SNPs can be influencing backfat thickness 
deposition in Canchim beef cattle. 

Conclusions 

In this study, we were able to identify a set of SNPs that 
correlates with approximately 50% of the deregressed es- 
timated breeding value variance for backfat thickness in 
Canchim beef cattle, which introduces the possibility of 
including these SNPs in the development of a low dens- 
ity SNP panel for future implementation of genomic se- 
lection program in Canchim beef cattle. We also have 
applied a new methodology using the Random Forest 
approach to identify novel gene candidates for improv- 
ing backfat thickness in Canchim beef cattle. In addition, 
although this study used backfat thickness as a target 
trait, other analyses of this type have successfully used 
other traits, thereby supporting the random forest ap- 
proach as a means of future investigations of livestock 
production traits. Lastly, some regions identified are not 
conspicuously associated with any specific genes. This 
suggests that they may be involved in as of yet unidenti- 
fied regulatory functions of gene expression or process- 
ing. Given the intrinsic complexity of biochemical 
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pathways, these regions and the genes within them merit 
a great deal of future investigations, specifically to how 
they correlate with backfat thickness deposition in 
Canchim beef cattle and to other breeds. 
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